ABSTRACT

A computer-based system for handling transcripts
describing social behavior, verbal and otherwise, is described in
this paper. Among the features of note in the system is the
provision for fairly flexible conventions for noting, in transcripts,
several kinds of useful data including actor designations, sequencing
indicators, and user-defined systematic annotations. (Author)
This paper describes some computational aids for handling
transcriptions of verbal and other social interaction, and discusses the
advantages for analysis which such automation allows. These aids are
part of a computer-based system called ACTS, for Activity Code and Text
System, which is being developed for the preparation, storage, retrieval,
and, eventually, some analysis of transcriptions of human behavior, verbal
and otherwise. The system was conceived in the course of work on the
analysis of language behavior in the classroom, and it should be of
interest to specialists in educational research, since in this field,
perhaps more than in any other area in the social sciences, the
description and analysis of segments of human behavior as it occurs
naturally in real situations has attracted considerable interest.

If transcripts, or other systematic descriptions, of behavior are
prepared and analyzed in other than a cursory or impressionistic way,
and particularly if the amount of data of this sort is at all large,
the facility of a suitably programmed computer for keeping track of data
items can be very helpful. Finding all occurrences of a given word, for
instance, in even a twenty page transcript can be very tedious for a
human being, and such chores are likely to lead not only to
disgruntlement but also to errors. A computer, on the other hand, can
perform such bookkeeping chores very quickly, and free the human
investigators for activities for which they are more suited, such as
making semantic judgements or discovering patterns which depend on cues
so subtle that computers as we presently understand how to program them
are not much help. Moreover, once the data are stored in the computer,
other kinds of analysis possible with computers (such as some kinds of
content analysis and the examination of some sequential patterns in the
data) can be performed on them which might be as useful to the
researcher as the initial "simple" information retrieval.

In a number of fields in the social sciences, investigators are
interested in behavior in situ, as it occurs naturally, as contrasted
with laboratory behavior from which only a few items of data are
abstracted in the context of some experimental design. It is in the
field of educational research, however, where perhaps the most extensive
investigation of situated behavior has occurred. The number of
observational studies of classroom behavior is quite large, and a good
deal of thought has gone into the description of classroom events,
making judgements about these events, and seeking patterns of these
events and relationships among them and other variables. The work of
Bellack (1960), Flanders (1962), L. Smith (1968), B. Smith (1967), and
their colleagues is well known. The first edition of Mirrors for
Behavior, the compendium of work on classroom and related interaction
instruments by A. Simon and G. Boyer, was very lengthy; and perhaps the
fact that the amount of work reported in the second edition (1970) is
approximately double that reported in the first edition (1968) may
indicate that investigations in this area continue to be challenging to
researchers.

ACTS was designed specially to facilitate such research, where
transcriptions or similar descriptions of behavior are prepared or might
be prepared, to be consulted in the course of making judgements, or
analyzed in and of themselves. If the analysis of descriptions of the
stream of ongoing behavior in some situation of interest is to be at all
fine grained, and if any sizeable amount of interaction is analyzed, then
the assistance of the computer in this analysis can effect a great
savings in human effort. Indeed, some kinds of analysis may be done
when the computer is involved which would be either too complicated or
too time-consuming otherwise.

In its present stage of development, ACTS allows storage of
transcript and related data in such a form that the researcher can
retrieve this information flexibly, either for perusal and comparison,
or else for further automatic processing. It consists of (1) routines
for input of transcripts, either on punch-cards or using typewriter-like
entry via tape cartridges prepared on an IBM MTST; (2) routines for
correcting and otherwise adjusting the transcripts once they have been
entered; (3) programmed procedures for segmenting the text and labelling
these segments according to type of data (word, sentence, actor block,
etc.); (4) procedures for setting up directories, or maps, which allow
the researcher to get to various parts of his data easily, and which can
allow linkages between his basic data and other data, such as files of
coded judgements; (5) basic retrieval and output procedures. It is
programmed in PL/1 and run on an IBM 360/65 with heavy reliance on disc
storage.

Basic Data Structuring

When an investigator wishes to use the facilities of ACTS, he
must first prepare his transcripts and related data for entry into the
computer, taking care to be consistent in such matters as spelling,
punctuation, and other conventions which may be relevant to machine
processing. From these carefully prepared texts, special representations
of the transcripts are generated in the computer, which are segmented and
marked internally in such a way that access to various parts of the data
can be had in a flexible way. These representations, called basic text
files, are at the heart of ACTS. The fact that the basic data are
structured and marked according to the type of the individual pieces of
data making up the transcript not only facilitates retrieval, but also
allows them to enter investigator-supplied analysis routines in a
systematic way.

This structuring of the data, which is prerequisite for the
setting up of directories and other linkage facilities, is performed by
a central program which uses cues present in the transcript as entered
into the computer (such as word boundaries, user-defined special brackets,
terminating punctuation marks, and so on), together with certain
specifications supplied by the investigator regarding optional use of
predetermined configurations of symbols. In performing this structuring,
several logically separable tasks are performed: (1) the stream of
characters, or written symbols, which make up the transcript, is
partitioned into stretches of characters which will become the contents
of first-order or elementary data-types; (2) these segments are labelled
according to the kind of data they represent (e.g., spoken word,
punctuation mark, special information segment); and (3) these first-order
labelled segments enter into higher-order structures, which indicate the
arrangement of the elementary data-type occurrences (such as spoken
sentence representation, special annotation with tag and contents, etc.).
As will be seen below, some of these higher-order structures are fairly
complex.
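
The three tasks just listed can be illustrated with a minimal sketch.
It is written in Python rather than the PL/1 of ACTS itself, and the
delimiters, the label names, and the functions partition, label, and
structure are all assumptions made for this illustration; none of them
is taken from the ACTS programs.

```python
import re

# Hypothetical conventions for this sketch only: "(...)" marks an actor
# designation, "/.../" an annotation, and ". ? !" terminate a sentence.
SEGMENT = re.compile(r"\([^)]*\)|/[^/]*/|[.?!]|[^\s.?!()/]+")

def partition(text):
    """Task 1: cut the character stream into first-order segments."""
    return SEGMENT.findall(text)

def label(segment):
    """Task 2: name the kind of data a segment represents."""
    if segment.startswith("("):
        return "ACTOR"
    if segment.startswith("/"):
        return "ANNOTATION"
    if segment in (".", "?", "!"):
        return "TERMINATOR"
    return "WORD"

def structure(segments):
    """Task 3: arrange labelled segments into sentences."""
    sentences, current = [], []
    for segment in segments:
        kind = label(segment)
        if kind in ("WORD", "TERMINATOR"):
            current.append(segment)
        if kind == "TERMINATOR":
            sentences.append(current)
            current = []
    if current:                      # words left without a terminator
        sentences.append(current)
    return sentences

segments = partition("(Joe) He hit me. (Mary) /slowly/ He did not!")
labelled = [(label(s), s) for s in segments]
sentences = structure(segments)
```

In this sketch the sentence grouping deliberately ignores actor
designations and annotations, so that a sentence is built only from
words and terminating punctuation.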



Thus, what initially was just a long string of symbols entering
the computer one after another becomes a string of structures of segments
of symbols, the parts of which are marked for future reference.
Without some kind of structuring, the transcript would remain just a string
of individual symbols, and any computation involving this string would
involve at least partial restructuring. The same is true, of course,
for any kind of data entering the computer.

Most researchers, at this stage of our affluence and technology,
are familiar with the preparation of numerical or qualitative code data
for machine processing. Ordinarily, items of data are punched into
specified positions on IBM cards, with strong constraints on what kind of
data must go exactly where. (The code letter for sex of respondent, for
instance, must be punched in column eleven and nowhere else; otherwise
it will be lost, or a statistical program may abort trying to compute
the product of the letter M and a test score.) For data prepared in the
above way, the segmentation of the symbols on the punchcard, and their
identification as to type, is performed by the programs which process them
according to formatting specifications which depend on exact location and
length of the strings of symbols.
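
A small sketch may make the fixed-format case concrete. The card
layout below is hypothetical (only the use of column eleven for the sex
code comes from the example above), and read_card is an illustrative
name, not a routine of any particular statistical package.

```python
# Hypothetical card layout; columns are 1-based, as on a punch card.
FIELDS = {
    "respondent_id": (1, 10),   # columns 1-10
    "sex": (11, 11),            # column 11
    "test_score": (12, 14),     # columns 12-14
}

def read_card(card):
    """Slice one 80-column card image into named fields by position alone."""
    card = card.ljust(80)
    return {name: card[start - 1:end].strip()
            for name, (start, end) in FIELDS.items()}

good = read_card("0000000042M 87")
shifted = read_card("0000000042 M87")   # sex code punched one column late
```

In the second card the sex code is lost and the letter M lands in the
score field, so a later attempt to treat that field as a number would
fail. Position alone decides what each symbol means.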

Textual data, on the other hand, cannot conveniently be
constrained to such fixed format specifications since, by its very nature,
the length and location of its units are variable. Furthermore, the
identification of type of data may depend on fairly complex contextual
cues. ACTS provides for basic segmentation and identification of
free-format textual data in a way that allows the user some latitude in
choice of punctuation cues, special uses of certain symbols, and so on.
If he wishes to use the symbol string "//", for instance, to indicate
boundaries in transcripts of spoken behavior (Garvey, 1970), he may do
so; there is no reason that he must be constrained to use ".", "?", or
"!". ACTS goes beyond ordinary text-handling systems also in that it is
tailored for transcript data, with its peculiarities, rather than being
based on the simpler graphical system of straight literary text.
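
The latitude in choice of boundary symbols can be sketched as a
segmenter that takes the terminating strings as a parameter. This is an
illustration under assumed names (make_sentence_splitter and its
arguments are hypothetical), not a description of how ACTS accepts such
specifications.

```python
import re

def make_sentence_splitter(terminators):
    """Build a splitter from investigator-chosen terminating strings."""
    # Longest first, so that a two-character mark is tried before a
    # one-character mark that happens to be its prefix.
    ordered = sorted(terminators, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")

    def split(text):
        # With a capturing group, re.split keeps the terminators:
        # [body, terminator, body, terminator, ..., tail]
        parts = pattern.split(text)
        found = []
        for i in range(0, len(parts) - 1, 2):
            body = parts[i].strip()
            if body:
                found.append((body, parts[i + 1]))
        tail = parts[-1].strip()
        if tail:
            found.append((tail, None))   # remainder with no terminator
        return found

    return split

split_slashes = make_sentence_splitter(["//"])
split_usual = make_sentence_splitter([".", "?", "!"])
```

The same routine then serves a transcript punctuated with "//" and one
punctuated conventionally; only the list of terminators differs.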

Basic kinds of data. The basic types of data described in this
paper, and cues for recognizing them, are essentially those outlined in
Hays (1970). Though they are only a subset of the kinds of structures
which can be recognized, they represent a fairly manageable package
designed to handle the structures likely to be useful in working with
transcripts of classroom behavior and similar situations.

The main kinds of data are as follows:

1. Actor designations. In describing social behavior, who is
doing the behaving usually must be indicated. An actor designation, in
an ACTS transcript, is delimited by user-defined special characters,
and may contain one or more actor identifications. If more than one actor
is cited as performing some action, let us say in performing a duet, actor
names are separated by commas or the word "and".

     (Joe) He hit me.

     (Henry and Jack) Ya ya ya ya ya. /to the tune of the
     familiar childhood taunt/

2. Annotations. Comments, systematically constructed or not,
are permitted in the transcript, and are set off from the rest of the text
by special delimiters or brackets. Examples:

     (Mary) /slowly/ The capital of Alabama is...

     /Mary sticks pencil in her ear./




3. Basic text units. In the scheme described here, whatever
is not annotation or actor designation is taken as basic description data.
For most applications, it is expected that this data will be some
representation of utterances, though it might be interpreted as overt
behavioral descriptions in some special language.

a. Basic text words. Any string bounded by blanks or
defined punctuation, which does not lie within annotation or actor
designation delimiters, is marked as a functional unit.

b. Terminating punctuation. Symbols, or strings of symbols,
may be defined by the investigator as constituting two kinds of punctuation
units, terminating and non-terminating. Both are recognized in about the
same way, relying on their pattern and on some contextual conditions.

A string of basic text units ending with a terminating punctuation,
with possibly a quotation character following it, makes up the higher-order
data-type sentence. Annotations may occur within sentences, or between
them, but may not occur within a first-order unit, such as a basic text
word. The data-type "sentence" may receive various interpretations. It
may be taken as a bounded sequence of behavioral descriptors in a code
language, for instance. Or it may refer to a major segmenting of a stretch
of spoken language in the usual sense. There is of course not any necessity
that the "sentence" be complete in the grammar-book sense.

An actor designation followed by anything, up to the next actor
designation, makes up an actor block. Normally, an actor block will
contain one or more sentences, and may contain annotations.

Sentence discontinuities. Sentences and actor blocks are the main
higher-order structures in transcripts. From one point of view, the
description of a social situation is a sequence of actor blocks, and the
main interest lies in the flow of activity from one actor to another.
In education research, when one is interested primarily in the interactive
aspects of the events, one is likely to focus on patterns of actor blocks,
and to approach finer analyses involving sentences and words within
sentences in this context. When one is primarily interested in language
behavior, however, many analyses will be based on sentences.

If it were the case that sentence boundaries always coincided
with actor block boundaries, retrieval and analysis would be simpler than
it is. However, people interrupt one another, and sentences are sometimes
finished after someone else has said something. (In describing simultaneous
events, sentence overlap of actor blocks is sometimes convenient, even
though there is no interruption in the usual sense.) For this reason,
sentence and actor block structure, and retrieval, are not strictly
hierarchical in ACTS.
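
One way to picture a structure that is not strictly hierarchical is to
let sentences and actor blocks be independent sets of ranges over the
same sequence of words. The sketch below is only that, a picture: the
Unit record, the sample words, and the function names are assumptions
for illustration and do not describe the internal layout of an ACTS
basic text file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    kind: str       # "ACTOR_BLOCK" or "SENTENCE"
    number: int
    pieces: tuple   # (start, end) word ranges, end exclusive

# Hypothetical transcript: the teacher's question is interrupted by Jim.
words = ["What", "country", "is", "Teacher", "!", "this", "?"]

blocks = [
    Unit("ACTOR_BLOCK", 1, ((0, 3),)),   # (T)   What country is
    Unit("ACTOR_BLOCK", 2, ((3, 5),)),   # (Jim) Teacher !
    Unit("ACTOR_BLOCK", 3, ((5, 7),)),   # (T)   this ?
]
sentences = [
    Unit("SENTENCE", 1, ((0, 3), (5, 7))),  # resumed after the interruption
    Unit("SENTENCE", 2, ((3, 5),)),
]

def text_of(unit):
    """Join the words covered by every piece of a unit."""
    return " ".join(w for start, end in unit.pieces for w in words[start:end])

def overlaps(a, b):
    """True if any piece of a shares at least one word with a piece of b."""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.pieces for s2, e2 in b.pieces)

def blocks_containing(sentence):
    """Numbers of the actor blocks that a sentence touches."""
    return [b.number for b in blocks if overlaps(sentence, b)]
```

Here the first sentence touches actor blocks 1 and 3 but not 2, which a
strict nesting of sentences inside actor blocks could not express.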

Annotations. The treatment of annotations in ACTS constitutes
one of the main areas of the system's flexibility in meeting the needs of
investigators. Working with transcripts describing the events in naturally
occurring situations (in fact, in public school classrooms), it has turned
out to be very convenient to include annotations in the text, either for
incidental information which will not receive formal analysis but which
helps to make the descriptions more understandable, or for information that
may be subject to more systematic treatment later, but that is not strictly
speaking a part of what was referred to above as "basic text data", such
as transliterations of speech. If the transcript consists only of words
which are spoken, together with actor identification, it is the case,
often enough to be annoying, that this bare record just doesn't make much
sense without auxiliary information. It is often interesting to include
auxiliary data in the transcript, in sequence; for instance, in analyzing
utterances, one may wish to sort one's data according to the apparent
'target' of the communication, as well as the source or speaker; or a
notation concerning the tone of voice of a teacher utterance may be
interesting in relation to the verbal content of the subsequent student
utterances.

To accommodate such needs, ACTS allows both unsystematic annotations,
which are included only for purposes of clarification (or setting down
hunches or strictly incidental observations into the record), and
systematic annotations (which we will sometimes call keyworded annotations),
which are prepared systematically and may be systematically retrieved as
well.
An unsystematic annotation is simply a string bounded by annotation
brackets. Systematic annotations are bounded by explicit brackets, but
have as well a tag or keyword, which labels the annotation according to
type. The choice of keyword, and the corresponding creation of an
annotation type, is left to the investigator. He has, in other words, the
ability to define a number of kinds of data, of relevance to his particular
research problems and data characteristics.
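
The distinction between the two kinds of annotation can be sketched as
a small classification routine. The keyword set, the slash delimiters,
and the name parse_annotation are assumptions for illustration; in ACTS
the investigator supplies his own keywords and brackets.

```python
# Hypothetical keyword set, as an investigator might define it.
KEYWORDS = {"TO", "BD", "SIM", "ESM", "UNIS", "INT"}

def parse_annotation(raw, keywords=KEYWORDS, opener="/", closer="/"):
    """Classify one bracketed annotation as systematic or unsystematic."""
    if not (raw.startswith(opener) and raw.endswith(closer)):
        raise ValueError("not a bracketed annotation: " + repr(raw))
    body = raw[len(opener):len(raw) - len(closer)].strip()
    first, _, rest = body.partition(" ")
    if first in keywords:
        # The first word is a defined tag, so the annotation is keyworded.
        return {"systematic": True, "keyword": first, "content": rest.strip()}
    return {"systematic": False, "keyword": None, "content": body}

target = parse_annotation("/TO John B/")
remark = parse_annotation("/Nancy seems indifferent./")
bare = parse_annotation("/UNIS/")
```

A keyworded annotation can then be retrieved by its tag alone or by tag
and content, while an unsystematic one is carried along only as text.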

For example, in Figure 1, annotations with the keyword "TO"
indicate the apparent targets of utterances. Annotations preceded by the
string "BD" indicate descriptions of overt behavior, and their contents
may be accessed in context for further processing. Having keywords in
uppercase letters facilitates their identification when human beings are
examining transcripts or parts of them (and also helps prevent errors in the
unwitting use of a keyword in what is meant to be an unsystematic annotation),

(T) Yes, Big Man, oh, /lengthened/ he's ferocious. He's fierce.          1

(Lavorah) Wow.                                                           2

(T) He nearly, he nearly tears that cage down. +I like to stand          3
way back from Big Man, though. Alright, [Nancy], what is your            4
favorite animal?

(Nancy) The horse. /Nancy seems indifferent./                            5

(T) /TO John B/ You are right. /TO class/ Okay, what is this?            6

(Part of Class) /SIM AA/ /SYNC/ A frog.                                  7

(Part of Class) /SIM AA/ /UNIS/ A tiger.                                 8

(T) /TO Mary B/ What is it, <Mary: S19>?                                 9

(Mary B) A turtle.                                                      10

(T) It's a turtle /lengthened/. Now say all together, /SIM DD/          11
turtle. /ESM DD/

(Class) /SIM DD/ /UNIS/ Turtle. /ESM DD/                                12

(T) Alright. /BD T walks to back of room./ Alright, which one is...#    13
/BD T pulls down map of India./ Here it is...# /TO male student         14
near back of classroom/ What country is                                 15

(--) /INT/ -- /apparently about an extraneous issue/                    16

(T) this? Uh... [Lavorah]                                               17

Figure 1


but is not necessary. The annotation "/lengthened/" is actually systematic
since "lengthened" is one of a set of keywords specified for ACTS for this
particular set of transcripts, which taken together are used to
systematically indicate prosodic characteristics of the spoken language.

An investigator may be interested in pauses, as well as some class
of gestural behaviors, in the context of spoken language analysis. For
his transcripts, he could define annotation keywords representing kinds
of pauses, and a type of systematic annotation for gestural indicators,
which would contain restricted content items.



     (Teacher) We uh /?/ want to

     (Teacher) We want to uh /UES/ make our lessons neatly /?/

Temporal overlap. One of the characteristics of almost any social
behavior is that the events tend to overlap in time, and it may be of
interest in an investigation to indicate this overlap in the descriptions
of the behavior. In the classroom, for instance, a student may start
giving an answer before the teacher has finished the question; or the
entire class may respond in unison with the teacher. Even in fairly
orderly classrooms, more than one student may be speaking at once, in an
animated discussion.

ACTS reserves two special annotations to bracket stretches of
text representing simultaneous events. Each consists of annotation
delimiters such as slashes, a keyword indicating the beginning or ending
of a segment which is simultaneous with some other segment, and a
two-character string which serves to distinguish each case of overlap.
For example, if SIM indicates the start of an overlap, and ESM indicates
the termination of an overlap, we might have:



     (T) Now this is a /SIM xy/ pork chop. /ESM xy/

     (Jim) /SIM xy/ pork chop. /ESM xy/

Simultaneity annotations, which allow retrieval of all examples
of overlap, together with the possibility that sentences may span more
than one actor block, together with other annotations which may be defined,
allow the depiction of the temporal sequencing of events in a not strictly
sequential fashion. Since the temporal sequencing and overlap of behavior
is a basic part of interaction data, ACTS thus provides facilities for
representing in computer structures this aspect of the basic structure of
the described events. In work that we have done, the simultaneity
annotations, two annotations for unison and asynchronous group responses,
an annotation to indicate an interruption, and a special terminating
punctuation unit which indicates a sentence "left hanging", allow
representation of what seem to be the major phenomena of this sort.
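
Retrieval of all examples of overlap can be sketched by pairing each
SIM annotation with the ESM annotation that carries the same
two-character string. The sketch assumes slash delimiters and assumes
that each overlap opens and closes within a single actor block; the
function name overlap_segments is hypothetical.

```python
import re

# Matches /SIM xy/ or /ESM xy/ and captures the keyword and the label.
MARK = re.compile(r"/(SIM|ESM) (\w\w)/")

def overlap_segments(actor_blocks):
    """Map each overlap label to the (actor, text) stretches inside it."""
    found = {}
    for actor, text in actor_blocks:
        open_at = {}                     # label -> offset just past its SIM
        for m in MARK.finditer(text):
            kind, tag = m.group(1), m.group(2)
            if kind == "SIM":
                open_at[tag] = m.end()
            elif tag in open_at:         # an ESM closing a known SIM
                start = open_at.pop(tag)
                stretch = text[start:m.start()].strip()
                found.setdefault(tag, []).append((actor, stretch))
    return found

blocks = [
    ("T", "Now this is a /SIM xy/ pork chop. /ESM xy/"),
    ("Jim", "/SIM xy/ pork chop. /ESM xy/"),
]
overlaps = overlap_segments(blocks)
```

All stretches filed under one label were, by the transcriber's
marking, simultaneous with one another.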

Getting to the Data

From the point of view of the user, there are two main ways of
accessing some part of the computer representation of a transcript: by
location, and by content.

The first, access by location, is fairly straightforward. Working
from a printout of a transcript, with actor blocks and sentences numbered,
the investigator specifies entry into the data representation at actor
block number 54, for instance, or the fifth word in sentence 528. Though
it is simple, this sort of access may be useful when transcripts are being
examined and scrutinized carefully, and parts of them extracted for
further processing using criteria which are subjective. This kind of
entry is useful as well when setting up directories which reflect some
judged 'episodic bracketing' of the text (Barker and Wright, 1956).



In accessing parts of the text by content, the real advantages of
computer retrieval become manifest. Currently, several kinds of content
may be used for retrieval purposes in ACTS. A transcript may be entered by
(1) actor designation, (2) basic text word, or (3) type of annotation
and (4) optionally content of annotation. For instance, one may wish to
access all utterances belonging to male students. Or one may wish to
find all sentences containing a personal pronoun.

Once a basic text file is entered, by either of the above methods,
examination can proceed backwards or forwards in the file, in what might
be called locally sequential processing. Or, one may wish to extract a
given content item in its immediate context, and go on to the next
occurrence of that content item.

Access by location is made possible in an efficient way by index
files which give the location in computer storage of each major
higher-order unit, in sequence. Access by content is made possible by
what are called inverted index files, which, for example, contain a list
of each distinct basic text word, together with a list of locations in
the text file of that word.
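
The two kinds of index file can be sketched as follows. The sample
sentences, the use of (sentence number, word position) pairs as
locations, and the function names are assumptions for illustration; the
ACTS files record locations in disc storage rather than in Python lists.

```python
from collections import defaultdict

# Hypothetical basic text file: a list of sentences, each a list of units.
sentences = [
    ["He", "hit", "me", "."],
    ["He", "nearly", "tears", "that", "cage", "down", "."],
    ["It", "is", "a", "turtle", "."],
]

def build_location_index(sentences):
    """Sentence number (1-based) -> offset of its first unit in the file."""
    index, offset = {}, 0
    for number, sentence in enumerate(sentences, start=1):
        index[number] = offset
        offset += len(sentence)
    return index

def build_inverted_index(sentences):
    """Distinct unit -> list of (sentence number, position) locations."""
    inverted = defaultdict(list)
    for number, sentence in enumerate(sentences, start=1):
        for position, unit in enumerate(sentence, start=1):
            inverted[unit.lower()].append((number, position))
    return dict(inverted)

by_location = build_location_index(sentences)
by_content = build_inverted_index(sentences)
```

The length of each location list in the inverted index is the frequency
of that item, so counts need no separate pass over the text.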

Immediate analysis aids. One consequence of the existence of
inverted index files is that frequency counts of text words, annotations,
and the occurrence of actor blocks associated with particular actors,
etc., are available, and do not have to be computed separately. A simple
listing of words occurring in a class hour, with their frequency of
occurrence, may be of some interest. If much use is made of systematic
annotations, frequency statistics on their occurrence may constitute
substantive results. Thus, some "analysis" is provided automatically by



Another consequence is that the investigator has a flexible tool
for selective perusal of his data. In trying to understand classroom
interaction (or interaction on the playground or in the home), simply being
able to examine, for instance, parts of the data with similar surface
content may lead to insights. The value of Key Words In Context (KWIC)
arrangement of textual data is well known; ACTS provides selective
contextual examination with the ability to specify conditions on what is
printed out (for instance, one may be interested only in teacher utterances
containing student names; or passages which contain marked interruptions).
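
A keyword-in-context listing with a condition on what is printed can be
sketched in a few lines. The sample data, the width of the context, and
the form of the condition (a function of actor and words) are
assumptions for illustration only.

```python
def kwic(sentences, keyword, width=3, condition=lambda actor, words: True):
    """Return keyword-in-context lines for sentences meeting `condition`."""
    lines = []
    for actor, words in sentences:
        if not condition(actor, words):
            continue                     # selective: skip unwanted sentences
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left:>20} [{word}] {right}")
    return lines

# Hypothetical data: (actor, words of one sentence).
data = [
    ("T", ["Alright", "Nancy", "what", "is", "your", "favorite", "animal"]),
    ("Nancy", ["It", "is", "the", "horse"]),
    ("T", ["What", "is", "it", "Mary"]),
]
teacher_only = kwic(data, "is", width=2,
                    condition=lambda actor, words: actor == "T")
```

Changing the condition, for instance to require a student name among
the words, narrows the listing without changing the routine.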

Linkages. Index files, or maps of the data, allow the association
of files of coded judgements with various parts of the data, at various
levels. For instance, one may be interested in coding attributes of actor
blocks for one kind of analysis; and coding attributes of sentences or
words for other purposes. Most of the work summarized in Simon and Boyer
(1970) involves making judgements about events at some level of molarity.
If an investigator is working with transcripts, it would appear to be useful
to store judgements about actor blocks and sentences, and perhaps words as
well. Certainly in any sophisticated analysis system, surface text alone
is not enough, given our present understanding of both interaction and
language, and the ability to link strings of attributes and their values
to basic descriptions should be useful.
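
Such a linkage can be pictured as a separate file of attributes keyed by
the level and number of the unit judged. The class JudgementFile, the
level names, and the attribute names and values below are all
hypothetical, chosen only to illustrate the idea.

```python
class JudgementFile:
    """Coded judgements keyed by the address of a unit in the text file."""

    def __init__(self):
        self._codes = {}                 # (level, number) -> {attribute: value}

    def code(self, level, number, attribute, value):
        """Attach one attribute value to one unit."""
        self._codes.setdefault((level, number), {})[attribute] = value

    def attributes(self, level, number):
        """All attributes recorded for a unit (empty if none)."""
        return dict(self._codes.get((level, number), {}))

    def units_where(self, level, attribute, value):
        """Numbers of the units at `level` coded with the given value."""
        return sorted(number
                      for (lvl, number), attrs in self._codes.items()
                      if lvl == level and attrs.get(attribute) == value)

judgements = JudgementFile()
judgements.code("actor_block", 54, "move", "soliciting")
judgements.code("actor_block", 55, "move", "responding")
judgements.code("actor_block", 60, "move", "soliciting")
judgements.code("sentence", 528, "tone", "warm")
```

Because the judgements refer to units only by number, they can be
prepared, revised, or replaced without touching the basic text itself.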

Varieties of , 

In the above, we have used examples, and based our discussion,
primarily on annotated transcriptions of utterances. Other kinds of texts
may be handled by ACTS as well.



For instance, one may work with what might be called a "compressed"
transcript, which relies heavily on paraphrase. Extensive notes taken
during the observation of a classroom, marked for date, class, teacher,
etc., may be stored as transcripts; and access by annotation-type or by
word may be useful for later consultation of the notes, especially if the
amount of them is sizeable. For instance, retrieval by student name may
be interesting.

Another kind of transcript has as its basic text not
transliterations of utterances, but behavior descriptions. These may be
in some behavioral description language, with a structure and semantics
less complicated than that of ordinary English, or even consist of a
string of codes. Formally, a string of code symbols in sequence, marked
as to actor, and bounded by "punctuation", may have the same
characteristics as a string of words reflecting verbal behavior.

The existence of an index file system allows not only coded
attributes but also other texts to be linked to basic text files.
Soskin and John's protocols (1963), for instance, separate behavioral
descriptions and utterances. One might similarly separate descriptions and
interpretations added after the fact, or sentences and canonical
paraphrases, for instance.

Current Status and Prospects

ACTS may still be described as a system in the process of development,
though its basic facilities are in use. Data structuring facilities of
somewhat more complexity than those described here are being worked with,
and are convenient in some applications involving fine-grained language data.
These more elaborate facilities are also somewhat more complicated to use,
and expensive in terms of computation.



Another area of development is facilities for flexible retrieval.
Basic access mechanisms exist, but since no user-oriented retrieval language
has been implemented, they are probably lacking somewhat from the point of
view of the investigator who has only a nodding acquaintance with computing.

In developing any data-handling system, needs of the people who
will use the system are very important. The kinds of data they will
enter into the system, and how they want to get at the data and analyze
it, have strong implications for a number of technical aspects of the
system as it is developed. It is tempting to assume that one knows what
other people want, or need, but often it is not the case that one in fact
does.

For this reason, we are most interested in learning more about the
kinds of data, and kinds of demands which might be placed on data, from
persons who are working with transcriptions or other descriptions of social
interaction, verbal or otherwise.

Summary 

A computer-based system, ACTS, for handling transcript data in
studies of social interaction, has been described, and some uses of the
system for retrieving data and structuring it for further analysis have